With the increasing ability of large language models (LLMs), in-context learning (ICL) has become a new paradigm for natural language processing (NLP), where LLMs make predictions only based on contexts augmented with a few training examples. It has been a new trend exploring ICL to evaluate and extrapolate the ability of LLMs. In this paper, we aim to survey and summarize the progress, challenges, and future work in ICL. We first present a formal definition of ICL and clarify its correlation to related studies. Then, we organize and discuss advanced techniques of ICL, including training strategies, prompting strategies, and so on. Finally, we present the challenges of ICL and provide potential directions for further research. We hope our work can encourage more research on uncovering how ICL works and improving ICL in future work.
translated by 谷歌翻译
Frame Semantic Role Labeling (FSRL) identifies arguments and labels them with frame semantic roles defined in FrameNet. Previous researches tend to divide FSRL into argument identification and role classification. Such methods usually model role classification as naive multi-class classification and treat arguments individually, which neglects label semantics and interactions between arguments and thus hindering performance and generalization of models. In this paper, we propose a query-based framework named ArGument Extractor with Definitions in FrameNet (AGED) to mitigate these problems. Definitions of frames and frame elements (FEs) in FrameNet can be used to query arguments in text. Encoding text-definition pairs can guide models in learning label semantics and strengthening argument interactions. Experiments show that AGED outperforms previous state-of-the-art by up to 1.3 F1-score in two FrameNet datasets and the generalization power of AGED in zero-shot and fewshot scenarios. Our code and technical appendix is available at https://github.com/PKUnlp-icler/AGED.
translated by 谷歌翻译
通过深度学习技术的开花,完全有监督的基于骨架的动作识别取得了巨大进步。但是,这些方法需要足够的标记数据,这不容易获得。相比之下,基于自我监督的骨骼的动作识别引起了更多的关注。通过利用未标记的数据,可以学会更多可概括的功能来减轻过度拟合的问题并减少大规模标记的培训数据的需求。受到MAE的启发,我们提出了一个空间式蒙面的自动编码器框架,用于基于3D骨架的自我监管的动作识别(Skeletonmae)。在MAE的掩蔽和重建管道之后,我们利用基于骨架的编码器变压器体系结构来重建蒙版的骨架序列。一种新颖的掩蔽策略,称为时空掩蔽,是根据骨架序列的联合级别和框架级别引入的。这种预训练策略使编码器输出可推广的骨骼特征具有空间和时间依赖性。给定未掩盖的骨架序列,编码器用于动作识别任务。广泛的实验表明,我们的骨架达到了出色的性能,并优于NTU RGB+D和NTU RGB+D 120数据集的最新方法。
translated by 谷歌翻译
视觉变压器(VIT)最近在一系列计算机视觉任务中占据了主导地位,但训练数据效率低下,局部语义表示能力较低,而没有适当的电感偏差。卷积神经网络(CNNS)固有地捕获了区域感知语义,激发了研究人员将CNN引入VIT的架构中,以为VIT提供理想的诱导偏见。但是,嵌入在VIT中的微型CNN实现的位置是否足够好?在本文中,我们通过深入探讨混合CNNS/VIT的宏观结构如何增强层次VIT的性能。特别是,我们研究了令牌嵌入层,别名卷积嵌入(CE)的作用,并系统地揭示了CE如何在VIT中注入理想的感应偏置。此外,我们将最佳CE配置应用于最近发布的4个最先进的Vits,从而有效地增强了相应的性能。最后,释放了一个有效的混合CNN/VIT家族,称为CETNET,可以用作通用的视觉骨架。具体而言,CETNET在Imagenet-1K上获得了84.9%的TOP-1准确性(从头开始训练),可可基准上的48.6%的盒子地图和ADE20K上的51.6%MIOU,从而显着提高了相应的最新态度的性能。艺术基线。
translated by 谷歌翻译
框架语义解析是一项基本的NLP任务,由三个子任务组成:框架标识,参数识别和角色分类。以前的大多数研究都倾向于忽略不同子任务与论点之间的关系,并且很少关注Framenet中定义的本体论框架知识。在本文中,我们提出了一个带有双层(KID)的知识引导的增量语义解析器。我们首先介绍框架知识图(FKG),这是一个构建框架知识上构建的框架和FES(帧元素)的异质图,以便我们可以得出框架和FES的知识增强表示。此外,我们提出了框架语义图(FSG)来表示用图形结构从文本中提取的框架语义结构。通过这种方式,我们可以将框架语义解析转变为增量图构造问题,以加强子任务之间的相互作用和参数之间的关系。我们的实验表明,在两个Framenet数据集上,KID的表现优于先前的最新方法1.7 f1得分。我们的代码可在https://github.com/pkunlp-icler/kid上使用。
translated by 谷歌翻译
基于深度学习的人网格重建方法具有构建更大网络的趋势,以实现更高的准确性。尽管是人网格重建模型的实际使用的关键特征,但往往忽略了计算复杂性和模型大小(例如,虚拟试用系统)。在本文中,我们呈现GTR,这是一种基于轻量级的姿势的方法,可以从2D人类姿势重建人网。我们提出了一种姿势分析模块,它使用曲线图形是利用结构化和隐式的关节相关性,以及将提取的姿势特征与网格模板组合以重建最终人体网格的网格回归模块。我们通过对人类3.6M和3DPW数据集进行广泛的评估,展示了GTR的效率和泛化。特别是,GTRS比SOTA姿势的方法POSE2MESH实现了更好的精度,同时仅使用10.2%的参数(PARAMS)和2.5%的跨越式3DPW数据集。代码将公开。
translated by 谷歌翻译
Masked image modeling (MIM) performs strongly in pre-training large vision Transformers (ViTs). However, small models that are critical for real-world applications cannot or only marginally benefit from this pre-training approach. In this paper, we explore distillation techniques to transfer the success of large MIM-based pre-trained models to smaller ones. We systematically study different options in the distillation framework, including distilling targets, losses, input, network regularization, sequential distillation, etc, revealing that: 1) Distilling token relations is more effective than CLS token- and feature-based distillation; 2) An intermediate layer of the teacher network as target perform better than that using the last layer when the depth of the student mismatches that of the teacher; 3) Weak regularization is preferred; etc. With these findings, we achieve significant fine-tuning accuracy improvements over the scratch MIM pre-training on ImageNet-1K classification, using all the ViT-Tiny, ViT-Small, and ViT-base models, with +4.2%/+2.4%/+1.4% gains, respectively. Our TinyMIM model of base size achieves 52.2 mIoU in AE20K semantic segmentation, which is +4.1 higher than the MAE baseline. Our TinyMIM model of tiny size achieves 79.6% top-1 accuracy on ImageNet-1K image classification, which sets a new record for small vision models of the same size and computation budget. This strong performance suggests an alternative way for developing small vision Transformer models, that is, by exploring better training methods rather than introducing inductive biases into architectures as in most previous works. Code is available at https://github.com/OliverRensu/TinyMIM.
translated by 谷歌翻译
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
translated by 谷歌翻译
Benefiting from the intrinsic supervision information exploitation capability, contrastive learning has achieved promising performance in the field of deep graph clustering recently. However, we observe that two drawbacks of the positive and negative sample construction mechanisms limit the performance of existing algorithms from further improvement. 1) The quality of positive samples heavily depends on the carefully designed data augmentations, while inappropriate data augmentations would easily lead to the semantic drift and indiscriminative positive samples. 2) The constructed negative samples are not reliable for ignoring important clustering information. To solve these problems, we propose a Cluster-guided Contrastive deep Graph Clustering network (CCGC) by mining the intrinsic supervision information in the high-confidence clustering results. Specifically, instead of conducting complex node or edge perturbation, we construct two views of the graph by designing special Siamese encoders whose weights are not shared between the sibling sub-networks. Then, guided by the high-confidence clustering information, we carefully select and construct the positive samples from the same high-confidence cluster in two views. Moreover, to construct semantic meaningful negative sample pairs, we regard the centers of different high-confidence clusters as negative samples, thus improving the discriminative capability and reliability of the constructed sample pairs. Lastly, we design an objective function to pull close the samples from the same cluster while pushing away those from other clusters by maximizing and minimizing the cross-view cosine similarity between positive and negative samples. Extensive experimental results on six datasets demonstrate the effectiveness of CCGC compared with the existing state-of-the-art algorithms.
translated by 谷歌翻译
As one of the prevalent methods to achieve automation systems, Imitation Learning (IL) presents a promising performance in a wide range of domains. However, despite the considerable improvement in policy performance, the corresponding research on the explainability of IL models is still limited. Inspired by the recent approaches in explainable artificial intelligence methods, we proposed a model-agnostic explaining framework for IL models called R2RISE. R2RISE aims to explain the overall policy performance with respect to the frames in demonstrations. It iteratively retrains the black-box IL model from the randomized masked demonstrations and uses the conventional evaluation outcome environment returns as the coefficient to build an importance map. We also conducted experiments to investigate three major questions concerning frames' importance equality, the effectiveness of the importance map, and connections between importance maps from different IL models. The result shows that R2RISE successfully distinguishes important frames from the demonstrations.
translated by 谷歌翻译